Indonesian Journal of Electrical Engineering and Computer Science 


Vol. 27, No. 1, July 2022, pp. 504~512 


ISSN: 2502-4752, DOI: 10.11591/ijeecs.v27.i1.pp504-512


Augmentation of contextual knowledge based on domain 
dominant words for IoT applications interoperability 


Prakash Shanmurthy¹, Poongodi Thangamuthu¹, Balamurugan Balusamy¹, Seifedine Kadry²
¹School of Computing Science and Engineering, Galgotias University, Greater Noida, India
²Department of Applied Data Science, Noroff University College, Oslo, Norway


Article Info

Article history:
Received Apr 6, 2022
Revised May 5, 2022
Accepted May 20, 2022

Keywords:
Internet of things
Natural language processing
Semantic web
Web ontology language


ABSTRACT

Semantic web technology is adapted to the internet of things (IoT) so that web-based applications
can connect services globally. A web ontology language (OWL) domain ontology is a powerful
machine-readable means of representing domain knowledge. Developers store IoT application-relevant
ontologies in a repository or catalogue so that the ontology files are available for reuse, but
many IoT application-relevant ontology files are not publicly available or are inaccessible.
The proposed idea is to extract the contextual knowledge of IoT applications whose ontology files
are inaccessible. Because the ontologies of these context-specific IoT applications cannot be
obtained, the corresponding ontology-based research papers are identified and their frequent terms
are computed. The selected contextual dominant frequent terms from the transport domain are passed
into a natural language processing (NLP) corpus modelled with the skip-gram flavour of word2vec,
which produces the most similar terms. Domain experts select the appropriate terms to annotate in
the OWL ontology for contextual knowledge augmentation. Finally, 1422 contextual terms were
generated based on the dominant terms of the selected IoT applications.


1. INTRODUCTION

The role of a semantic web ontology is to annotate atomic web concepts and relations in
machine-readable form, which enables inference and provides interoperability across multiple
domains. Researchers utilize the power of web ontology language (OWL) ontologies for Internet of
Things (IoT) related applications to virtually represent the sensor names, relationships between
sensors, sensor-generated values, and relevant protocols in an unstructured way. Natural language
processing (NLP) has two major parts, namely natural language understanding (NLU) and natural
language generation (NLG). NLU involves mapping the given input into the required representation
and analysing various aspects of the language. NLU is much harder than NLG and, in particular, the
efficiency of an NLP corpus depends entirely on the size of the input text file. Word2vec is a
popular technique for generating word embeddings and offers two architectures, namely continuous
bag of words (CBOW) and skip-gram. The CBOW model predicts a focus word from its context words; it
requires only a small corpus, trains fast, and suits frequent words. The skip-gram model predicts
context words from a focus word, which helps to explore relevant contextual words. The CBOW model
is trained with the negative sampling method, and the skip-gram model is trained with the
hierarchical softmax method [1].
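
The difference between the two architectures can be illustrated by the training examples each one
derives from the same sentence. The following is a minimal sketch in plain Python, not the
word2vec implementation itself; the function names, the window size, and the sample sentence are
assumptions made for illustration only.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram view: each focus word is paired with every neighbour it should predict."""
    pairs = []
    for i, focus in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((focus, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW view: the neighbours together form the input that should predict the focus word."""
    examples = []
    for i, focus in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = tuple(tokens[j] for j in range(lo, hi) if j != i)
        examples.append((context, focus))
    return examples


# Hypothetical sentence, used only to show the shape of the training data.
sentence = ["bus", "sensor", "sends", "location"]
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
```

With a window of 1, the skip-gram view yields six (focus, context) pairs for this sentence, while
the CBOW view yields one example per word, which is one reason skip-gram exposes more contextual
pairs for a given focus term.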

This article concentrates on designing a knowledge augmentation methodology to extract the
knowledge of IoT applications whose ontologies are unavailable. The methodology supports the reuse
of existing IoT application knowledge. Section 2 reviews related work on different ontology
construction methodologies. Section 3 proposes the knowledge augmentation methodology. Section 4
explains the experimental analysis of the transport domain skip-gram model corpus along with the
dominant words. Section 5 presents the results and discussion. Finally, section 6 presents the
conclusion and future work.


2. LITERATURE REVIEW 

All text contents are converted into concepts and relations using hybrid supervision and a neural
network; in that work, a standard pre-trained neural network is not utilized [2]. Various levels of
an assertion, namely substance-level, structure-level, intention-level, and situation-level, are
taken into consideration for ontology construction [3]. The building topology ontology (BOT) was
created and linked with the world wide web consortium (W3C); it describes catalogues, IoT devices,
and sensors, and was also demonstrated for web-based applications [4]. A fuzzy ontology was designed
based on fuzzy rules and fuzzy metrics for the medical domain [5]. The development of an industry
ontology from unstructured text depends fully on the support of domain experts; automation and
intelligent techniques are not utilized in that ontology design methodology [6]. Ontology alignment
is applied to entities of different ontologies. Once the domain-related standard words are framed,
constructing a new ontology concept with some similar concept names becomes considerably easier.
The ontology is dismantled into terms in order to find words similar to the existing terms [7].

Various ontology design methodologies are analysed and a waste management ontology is constructed
with the parameters of reuse, interoperability, and knowledge acquisition, but how interoperability
is achieved is not demonstrated [8]. A learning ontology is constructed using pre-processed
electronic textbooks and NLP techniques [9]. A superscale ontology and a real-time ontology were
framed to reduce the ontology search time; they provide multi-domain correlation and multi-domain
intercommunication. Instead of machine learning techniques, an adaptive filter algorithm was
introduced to integrate the domain ontologies [10]. Semantic reasoning is possible over a single
ontology but not over multiple ontologies. Using a deep learning technique, new inference rules
were found based on many semantic networks formed from ontology triples and relations [11]. During
the software testing phase, an ontology that contains the software testing process information and
failure details solves the knowledge silo problem [12]. Cross-domain knowledge is integrated into a
multi-aspect ontology that supports the decision system [13]. The semantic virtualization technique
suggests removing the vertical barriers between various IoT application standards, which leads to
data acquisition plans [14]. An ontology learning algorithm was proposed that develops the ontology
from a property graph, aligns the developed ontology with the corresponding domain ontology, and
performs automated mapping from relational and non-relational models to the resource description
framework (RDF) model. The drawback of this model lies in the way it addresses terminological
heterogeneity, because string similarity is computed using only two models [15]. A knowledge
extraction methodology was proposed to retrieve domain knowledge from the Freebase RDF dumps; it
addresses the challenges of retrieving triples from Freebase. Its main drawback is that no
intelligent technique is utilized for the extraction of objects from triples [16]. Another
methodology tries to remove the semantic, syntactic, and structural heterogeneity of homogeneous
domain databases using a data turn and query turn scheme [17].

The OntoKhoj model performs ontology crawling, ontology classification, and ontology ranking based
on the semantic web [18]. An ontology is developed using ontology mapping and merging, in which an
entire ontology, instead of keywords, is used as the input query [19]. Ontodia is an open-source
JavaScript tool that supports the visualization of complex ontologies for learning purposes [20].
OntoSearch is a search engine based on Java server pages (JSP) and Jena technologies that includes
utilities for searching ontologies using keywords and visualizing ontology elements [21]. Nowadays,
a large amount of text is available as unstructured data, semi-structured data, and web pages. The
relation extraction model explores multiple relation extraction with three labels (i.e., entity
category, relation category, and relation condition) [22]. The required context text content is
extracted from the original cover text using the adaptive binary coding method [23]. Parallel
computation-based classification methods are difficult to adopt on another platform because of the
varying requirements of users; a platform-independent parallel classification method is instead
performed by OWL ontologies with parallel reasoning [24].

Ontological concepts are extracted as terms; the frequent ontological terms are then passed into
the NLP corpus, which generates similar terms, and these terms are clustered using the K-means
algorithm. However, for several IoT applications, the ontology is not publicly available in the
ontology catalogue [25].


3. PROPOSED METHOD

Figure 1 describes the knowledge augmentation methodology, which produces similar words based on
dominant words. The dominant terms in the input are obtained from research articles related to
IoT-based ontology files that are inaccessible. The index, token, Dictionary_count, word_count,
lower case, upper case, character, and underscore are respectively referred to as 'I', 'TK', DC,
WC, LC, UC, ch, and US.

Figure 1. Knowledge augmentation methodology


- Step 1 (Loading of text file): The domain-relevant research articles are loaded for pre-processing.

- Step 2 (Pre-processing): The research articles are converted into strings, from which the author
  citations are eliminated. Then the digits, punctuation, and whitespace are removed from the input
  file. Figure 2 portrays the removal of author citations.

- Step 3 (Term identification): Strings are changed into lower case, except for camel case and
  pascal case multiwords. Figure 3 illustrates the identification of single words and multiwords
  with the required template.

- Step 4 (Dominant word computation): The frequencies of unique words are computed over the set of
  research articles whose ontology is publicly unavailable or inaccessible under specific contexts
  in the transport domain. The computed high-frequency words are considered dominant words, as
  described in Figure 4.

- Step 5 (Word-to-vector): The research articles are $D = \{D_1, \ldots, D_n\}$, where $n$ is the
  total number of research documents. The similar terms generated from the dominant input terms of
  the articles in $D$ are
  $T = \{T_{11}, \ldots, T_{1m}, \ldots, T_{i1}, \ldots, T_{im}, \ldots, T_{n1}, \ldots, T_{nm}\}$,
  where $m$ represents the total number of similar words for the specific domain research articles.
  These terms are generalized into $T_w = \{T_{w_1}, T_{w_2}, \ldots, T_{w_m}\}$, where $m$
  represents the total number of terms. The cosine similarity between two word vectors $a$ and $b$
  is calculated using (1).

  $$\cos(\theta) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} \qquad (1)$$

- Step 6 (Group of terms transformed as ontology concepts): The similar terms generated in step 5
  are grouped and represented as ontological concepts, say
  $C_w = \{C_{w_1}, C_{w_2}, \ldots, C_{w_n}\}$.
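
Step 5 ranks candidate terms by cosine similarity, that is, the dot product of two word vectors
divided by the product of their norms. The sketch below shows how a most-similar lookup can be
built on that measure. It is a simplified illustration, not the trained skip-gram model: the
three-dimensional vectors, the vocabulary, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def most_similar(term, embeddings, topn=3):
    """Rank every other term in the vocabulary by cosine similarity to `term`."""
    query = embeddings[term]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != term
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:topn]


# Toy vectors (assumed values); a real model would learn these from the corpus.
toy_embeddings = {
    "vehicle": [0.9, 0.1, 0.0],
    "car": [0.8, 0.2, 0.1],
    "sensor": [0.1, 0.9, 0.2],
    "patient": [0.0, 0.2, 0.9],
}
print(most_similar("vehicle", toy_embeddings, topn=2))
```

In the methodology, the ranked list returned for each dominant word is what the domain experts
would review before annotating terms as ontology concepts.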

Pseudocode: Removal of citation
Input: Text file
Output: Text file without author citation
start
    Convert input string into Python tokens
    Initialize I = 0
    for TK[I] until end_of_list
        if TK[I] = '(' then
            if TK[I+1] = digit then
                Skip below statements
            endif
            for TK[I] = '(' until TK[I+1] = ')'
                if TK[I+1] = string then
                    if TK[I+2] = digit then
                        if TK[I+3] = '(' then
                            Remove TKs from I to I+3 in list
                        endif
                    endif
                endif
            endfor
        endif
        increment I
    endfor
end

Figure 2. Pseudocode for removal of citation

Pseudocode: Case folding
Input: List with Python tokens
Output: List2, whose tokens are LC except for camel case and pascal case
start
    Initialize I = 0, I2 = 1
    for TK[I] until end_of_list
        if TK[I] = string then
            for TK_Char in TK[1:]
                if TK_Char[I2] = UC then
                    Append TK_Char[I2] in list2
                    break the iteration
                elseif TK[I] = LC then
                    Convert TK_Char[I2] into LC
                    Append TK_Char[I2] in list2
                endif
                increment I2
            endfor
        elseif TK[I] = '-' then
            Remove the hyphen
        elseif TK[I] = '_' then
            T = individual token US position
            if TK_Char[I2] = '_' then
                Remove US
                After US convert ch into LC
            else
                Remove US
            endif
        elseif TK_Char[I2] = ':' then
            Replace the colon with whitespace
        elseif TK_Char[I2] = LC then
            Append TK[I] in list2
        endif
    endfor
end

Figure 3. Pseudocode for case folding

Pseudocode: Dominant word count
Input: List with Python tokens
Output: Dictionary with word frequency
start
    Initialize DC = NULL
    for TK[I] until end_of_list
        if TK[I] = LC then
            Update WC of TK[I] in DC
        else
            for TK_Char in TK[1:]
                if TK_Char[I2] = UC then
                    Append TK_Char[I2] in list2
                endif
            endfor
        endif
    endfor
end

Figure 4. Pseudocode for dominant word count
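
The citation removal and dominant word count of Figures 2 and 4 can be approximated in a few lines
of Python. This is a simplified sketch under stated assumptions, not the authors' token-by-token
procedure: it uses regular expressions instead of a token scan, the citation shapes it recognises
("(Smith 2020)", "(Lee et al., 2019)", "[12]") are assumed because the paper does not list them,
the small stop-word set is hypothetical, and it lower-cases every token rather than preserving
camel case and pascal case multiwords as step 3 does.

```python
import re
from collections import Counter

# Assumed citation shapes: author-year in parentheses and numeric markers in brackets.
AUTHOR_YEAR = re.compile(r"\(\s*[A-Z][A-Za-z-]+(?:\s+et\s+al\.?)?,?\s+\d{4}[a-z]?\s*\)")
NUMERIC = re.compile(r"\[\s*\d+(?:\s*[,-]\s*\d+)*\s*\]")

# Hypothetical stop-word set; the paper does not say how function words are handled.
STOPWORDS = frozenset({"the", "a", "an", "to", "of", "and"})


def remove_citations(text):
    """Replace recognised citation markers with a space."""
    text = AUTHOR_YEAR.sub(" ", text)
    return NUMERIC.sub(" ", text)


def tokenize(text):
    """Drop digits and punctuation, lower-case, split on whitespace, skip stop words."""
    cleaned = re.sub(r"[^A-Za-z\s]", " ", text)
    return [word for word in cleaned.lower().split() if word not in STOPWORDS]


def word_frequencies(documents):
    """Count every remaining word across all documents (the DC dictionary of Figure 4)."""
    counts = Counter()
    for document in documents:
        counts.update(tokenize(remove_citations(document)))
    return counts


# Made-up sentences standing in for research articles.
docs = [
    "The parking system (Smith 2020) uses a sensor [12].",
    "A parking sensor reports to the system (Lee et al., 2019).",
    "The system stores parking data.",
]
freq = word_frequencies(docs)
print(freq.most_common(3))
```

The highest-count entries of the resulting dictionary play the role of the dominant words of a
context.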


4. EXPERIMENTAL ANALYSIS 

The NLP corpus is created from 1000 transport domain-related research articles downloaded from
reputed journals under the contexts of aircraft, airport, ambulance, bicycle, bike, bus, car,
cargo ship, crane, electric vehicle, electric vehicle charging, flight, GPS, parking, road traffic,
train, and vehicle. The citations, article template, references, and bibliography are removed from
the research papers to obtain an efficient NLP corpus. A total of 15 lakh (1.5 million) words are
used to train the skip-gram neural network model. Multiwords are converted into appropriate words
using the case-folding algorithm. According to the case folding pseudocode in Figure 3, the
following rules are applied: i) rule 1, the camel case multiword 'deviceNode' is not changed;
ii) rule 2, the pascal case multiword 'DeviceNode' is also not changed; iii) rule 3, the kebab-case
multiword 'device-node' is converted into the camel case multiword 'deviceNode'; iv) rule 4, the
RDF label 'skos:altLabel' is converted into the single words 'skos' and 'altLabel'; and v) rule 5,
the upper snake case multiword 'HAS_NEXT' is transformed into the pascal case multiword 'HasNext'.
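
The five rules can be expressed compactly with string operations. The following is a minimal
sketch rather than a transcription of Figure 3; the function name `fold_case` is hypothetical, and
the treatment of inputs the rules do not mention (for example lower snake case, or a token that is
only a separator) is an assumption.

```python
import re


def fold_case(token):
    """Apply the five case-folding rules to one token and return the resulting token list."""
    # Rule 4: a prefixed RDF label is split at the colon into separate words.
    if ":" in token:
        return [t for part in token.split(":") if part for t in fold_case(part)]
    # Rule 3: kebab-case becomes camel case.
    if "-" in token:
        parts = [p for p in token.split("-") if p]
        if not parts:
            return []
        if len(parts) == 1:
            return fold_case(parts[0])
        return [parts[0] + "".join(p[:1].upper() + p[1:] for p in parts[1:])]
    # Rule 5: snake case becomes pascal case (lower snake case is handled the same way
    # here, which is an assumption).
    if "_" in token:
        parts = [p for p in token.split("_") if p]
        if not parts:
            return []
        return ["".join(p.capitalize() for p in parts)]
    # Rules 1 and 2: camel case and pascal case multiwords are left unchanged.
    if re.search(r"[a-z][A-Z]", token):
        return [token]
    # Every other token is lower-cased.
    return [token.lower()]


for example in ["deviceNode", "DeviceNode", "device-node", "skos:altLabel", "HAS_NEXT", "System"]:
    print(example, "->", fold_case(example))
```

Returning a list keeps rule 4 uniform with the other rules, since a single input token can yield
more than one output word.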

Table 1 describes the transport domain-related contexts and the corresponding dominant words. The
airport, ambulance, cargo ship, and road traffic contexts have the same most dominant word,
"system". The table lists the highest-frequency words of each context. The "Aircraft" context has
the following high-frequency words: information, datum, aircraft, assembly, network, management,
component, power, ot, inventory, application, process, control, technology, battery, and operation.
The "Airport" context has the following high-frequency words: system, sensor, device, node, address,
airport, ot, network, information, server, baggage, lora, send, agent, strategy, rfid, service, tag,
passenger, base, time, parking, design, location, architecture, reader, and communication. The
"Ambulance" context has the following high-frequency words: system, patient, datum, ambulance,
device, sensor, ot, health, network, information, smart, paramedical, disease, control, and base.
These dominant terms were passed into the skip-gram algorithm, which generated 1264 words. The
context-wise generated word counts are ambulance (40), bicycle (40), bus (48), electric vehicle
(48), aircraft (56), bike (56), crane (56), car (64), road traffic (64), airport (72), GPS (72),
flight (88), parking (88), rapid transit (88), train (88), transport (88), cargo ship (96), and
vehicle (112). The percentage of reusable concepts is calculated using (2).

$$\text{Reusability} = \frac{q}{m} \times 100 \qquad (2)$$

Here, 'q' is the number of concepts matched with standard IoT-related ontologies and 'm' represents
the total number of terms. The matching between ontology concepts is performed with the SPARQL
query language.
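
As a worked illustration of the reuse measure, the sketch below assumes that (2) is the ratio of
matched concepts 'q' to total terms 'm', expressed as a percentage. The paper performs the matching
with SPARQL queries; here it is replaced by a plain case-insensitive set intersection, which is an
assumption, and the term lists are invented for the example.

```python
def reuse_percentage(generated_terms, reference_concepts):
    """Return 100 * q / m, where q counts generated terms found among the reference concepts."""
    generated = {term.lower() for term in generated_terms}
    reference = {concept.lower() for concept in reference_concepts}
    m = len(generated)
    if m == 0:
        return 0.0  # no generated terms, so nothing can be reused
    q = len(generated & reference)
    return 100.0 * q / m


# Hypothetical inputs: five generated terms, two of which occur in a reference ontology.
generated = ["Sensor", "vehicle", "battery", "parking", "location"]
reference = ["sensor", "observation", "location", "platform"]
print(reuse_percentage(generated, reference))
```

Here q = 2 and m = 5, so the function reports 40.0 for this made-up input.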

Table 1. List of dominant words

Transportation-related context    Dominant words
Aircraft                          information, datum, aircraft, assembly
Airport                           system, sensor, device, node
Ambulance                         system, patient, datum, ambulance
Bicycle                           document, smart, bicycle, city
Bike                              bike, datum, share, bicycle
Bus                               bus, time, network, datum
Car                               car, system, use, datum
Cargo Ship                        system, service, use, cloud
Crane                             datum, digital, twin, crane
Electric Vehicle                  ev, system, use, vehicle
Electric Vehicle Charging         ev, charge, battery, use
Flight                            av, vs, datum, ot
GPS                               technology, system, ot, gps
Parking                           parking, car, system, user
Road Traffic                      system, traffic, transportation, datum
Train                             tensor, datum, ot, flow
Vehicle                           vehicle, system, accident, ot


5. RESULT AND DISCUSSION

Figure 5 shows the percentage of ontological term inputs and dominant term inputs. In the contexts
of aircraft, cargo ship, and flight, the bar chart shows a high percentage of dominant words, which
indicates that a greater number of IoT application ontologies in these contexts are not accessible.
The transport, road traffic, bus, and ambulance contexts show a lower percentage of dominant words,
which indicates that more of the IoT application ontologies in these contexts are accessible.

Figure 5. Comparison of ontological terms inputs and dominant words inputs


Figure 6 portrays the number of available ontology concepts in the transportation domain. A total
of 16300 contextual terms were generated based on ontological inputs, whereas 1422 contextual
ontological concepts were obtained based on dominant words. The transport, car, and road traffic
contexts have more concepts than the other contexts. The aircraft, flight, and cargo ship contexts
have fewer concepts. Figure 6 clearly shows no correlation between the number of ontology concepts
based on ontological inputs and the number of ontology concepts based on dominant word inputs. This
suggests that every transport context contains inaccessible ontologies, that is, ontologies for
which the semantic web best practice guidelines are not followed.

Figure 6. Count of ontology concepts based on ontological inputs and dominant words inputs


Table 2 shows the set of contexts and the corresponding interoperability percentage. The
interoperability parameter is defined as the reuse of a specific domain's ontology concepts in
other domain applications. The ontology concepts generated for the 'transport' context show 40%
reuse, and those generated for the 'road traffic' context show 35% reuse.


Table 2. Contextual knowledge reuse 


Context Percentage of ontology concept reuse 
Aircraft 8 
Airport 22 
Ambulance 19 
Bicycle 26 
Bike 31 
Bus 33 
Car 28 
Cargo Ship 11 
Crane 18 
Electric Vehicle 22 
Flight 8 
GPS 27 
Parking 27 
Rapid Transit 29 
Road Traffic 35 
Train 19 
Transport 40 
Vehicle 26 


Figure 7 illustrates the percentage of ontology concepts generated by the knowledge extraction for
context methodology [25], based on the ontological concepts of transportation-related IoT
applications. The airport, ambulance, bicycle, crane, and train contexts each account for 4% of the
ontological concepts. The bus, car, electric vehicle, rapid transit, and vehicle contexts each
account for 6%. The bike context accounts for 5%, the parking context for 7%, and the GPS context
for 8%. The cargo ship and flight contexts each account for 2%. The transport context accounts for
the highest share of ontological concepts, and the road traffic context for the second-highest.

Figure 8 shows the percentage of ontology concepts generated by the knowledge expansion methodology
based on dominant words. The aircraft, crane, bike, bus, and electric vehicle contexts each account
for 4% of the ontology concepts. The car and road traffic contexts each account for 5%. The airport
and GPS contexts each account for 6%. The flight, train, rapid transit, parking, and transport
contexts each account for 7%. The ambulance and bicycle contexts each account for 3%. The cargo
ship context has the second-highest percentage (8%) of ontology concepts, and the vehicle context
has the highest percentage (9%). The aircraft and flight contexts show only 8% ontology concept
reuse, which is very low compared with the transport and road traffic contexts.

Figure 7. Generation of ontology concepts based on ontological inputs

Figure 8. Generation of ontology concepts based on dominant words inputs


Figure 9 describes the percentage of ontology matches between the generated ontology concepts and
popular IoT application ontologies such as the SSN ontology, LOV4IoT, and the M3 ontology. The
knowledge extraction methodology [25] produces 16300 concepts, which support more interoperability.
The knowledge expansion methodology generates only 1422 contextual concepts, but it also supports
the reuse of ontology concepts. Ontology concept reuse is highest with a hybrid methodology, that
is, when the concepts explored by the knowledge extraction methodology from ontological elements
are combined with those explored by the knowledge expansion methodology from dominant terms. This
gives maximum interoperability across the various contexts of transport-related IoT application
domains.

Figure 9. IoT application reusability based on ontological inputs and dominant words inputs


6. CONCLUSION AND FUTURE WORK

The IoT application-related ontologies are identified from ontology catalogues and repositories.
Some IoT application ontologies are publicly unavailable or inaccessible. For those applications,
the frequent terms of the related research articles are identified and passed into the word2vec NLP
corpus to generate similar terms, which are annotated in the ontology as ontology concepts. These
concepts are then added to the existing clustered ontological terms, which produces a maximum of
40% reusability for the transport domain. In the future, the developed ontology concepts can be
converted into a knowledge graph that can be used for communication, decision making, and
notification generation.